Image Captioning Model Using Part-of-Speech Guidance Module for Description With Diverse Vocabulary
Abstract
Image captioning aims to generate human-like sentences that describe an image's content. Recent developments in deep learning (DL) have made it possible to caption images with accurate descriptions and detailed expressions. However, since DL learns the relationship between images and captions, it constructs captions based on the words that occur most frequently in the dataset. Although these generated captions are highly accurate, they have low lexical diversity, unlike humans, due to the limited vocabulary. Therefore, in this paper, we propose a Part-Of-Speech (POS) guidance module and a multimodal-based image captioning model that determines the intensity of images and word sequences and generates captions through POS to enhance the lexical diversity of DL. The proposed POS guidance module enables rich expression by controlling the image information and the sequence information with the predicted POS to predict words. Then, the POS multimodal layer adds the output vector of the Bi-LSTM using the POS and predicts the next caption, considering the grammatical structure. We trained and tested the proposed model on the Flickr30K and MS COCO datasets and compared it with current state-of-the-art studies. Also, we analyzed the lexical diversity of the model through the Type-Token Ratio (TTR) and confirmed several ...
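The Type-Token Ratio (TTR) mentioned in the abstract is commonly defined as the number of distinct tokens divided by the total number of tokens. The following is a minimal sketch of that general definition, not the paper's evaluation code: the function name `type_token_ratio`, the lowercase whitespace tokenization, the corpus-level pooling of captions, and the example captions are all assumptions made for illustration.

```python
def type_token_ratio(captions):
    """Corpus-level TTR: distinct tokens / total tokens over all captions.

    Assumption: tokens are lowercase, whitespace-separated words. The paper
    may tokenize differently or compute the ratio per caption.
    """
    tokens = [tok for caption in captions for tok in caption.lower().split()]
    if not tokens:
        return 0.0  # avoid division by zero on empty input
    return len(set(tokens)) / len(tokens)


# Hypothetical captions: repeated wording vs. more varied wording.
generic = ["a man is standing", "a man is sitting"]
varied = ["a surfer rides a wave", "two children build sandcastles"]

print(type_token_ratio(generic))  # 5 distinct / 8 total
print(type_token_ratio(varied))   # 8 distinct / 9 total
```

A higher value indicates that fewer words are reused across the captions. TTR is sensitive to corpus size, so it is usually compared across models only on caption sets of similar length.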
Similar resources
Diverse Image Captioning via GroupTalk
Generally speaking, different persons tend to describe images from various aspects due to their individually subjective perception. As a result, generating the appropriate descriptions of images with both diversity and high quality is of great importance. In this paper, we propose a framework called GroupTalk to learn multiple image caption distributions simultaneously and effectively mimic the...
Guided Open Vocabulary Image Captioning with Constrained Beam Search
Existing image captioning models do not generalize well to out-of-domain images containing novel scenes or objects. This limitation severely hinders the use of these models in real world applications dealing with images in the wild. We address this problem using a flexible approach that enables existing deep captioning architectures to take advantage of image taggers at test time, without re-tr...
The Use of Appropriate MADM Model for Ranking the Vendors of MCI Equipments Using Fuzzy Approach
Nowadays, the science of decision making has received more attention due to the complexity of supplier selection problems. As is known, one of the efficient tools in economic and human resources development is the extension of communication networks in developing countries. So, the proper selection of suppliers of TC equipment is of great concern. In this study, a ...
Text-Guided Attention Model for Image Captioning
Visual attention plays an important role in understanding images and has proven effective in generating natural language descriptions of images. On the other hand, recent studies show that language associated with an image can steer visual attention in the scene during our cognitive process. Inspired by this, we introduce a text-guided attention model for image captioning, which learns t...
Image Captioning using Visual Attention
This project aims at generating captions for images using neural language models. There has been a substantial increase in the number of proposed models for the image captioning task since neural language models and convolutional neural networks (CNNs) became popular. Our project is based on one such work, which uses a variant of a recurrent neural network coupled with a CNN. We intend to enhance t...
Journal
Journal title: IEEE Access
Year: 2022
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2022.3169781